Abstract

A sensor network is a collection of wireless devices that 
are able to monitor physical or environmental condi- 
tions. These devices (nodes) are expected to operate 
autonomously, be battery powered and have very lim- 
ited computational capabilities. This makes the task of 
protecting a sensor network against misbehavior or possible
malfunction a challenging problem. In this document we
discuss the performance of Artificial immune systems (AIS)
when used as the mechanism for detecting misbehavior.

We show that (i) mechanisms of the AIS have to be
carefully applied in order to avoid security weaknesses, 
(ii) the choice of genes and their interaction have a pro- 
found influence on the performance of the AIS, (iii) 
randomly created detectors do not comply with limi- 
tations imposed by communications protocols and (iv) 
the data traffic pattern does not seem to significantly
impact the overall performance.

We identified a specific MAC layer based gene that
proved to be especially useful for detection; genes
measure a network's performance from a node's view- 
point. Furthermore, we identified an interesting com- 
plementarity property of genes; this property exploits 
the local nature of sensor networks and moves the bur- 
den of excessive communication from normally behav- 
ing nodes to misbehaving nodes. These results have a 
direct impact on the design of AIS for sensor networks 
and on engineering of sensor networks. 

1 Introduction and Motivation 

Sensor networks [21] can be described as a collection
of wireless devices with limited computational abilities
which are, due to their ad-hoc communication manner,
vulnerable to misbehavior and malfunction. It is there- 
fore necessary to support them with a simple, computa- 
tionally friendly protection system. 

Due to the limitations of sensor networks, there has 
been an on-going interest in providing them with a pro- 
tection solution that would fulfill several basic criteria. 
The first criterion is the ability of self-learning and self- 
tuning. Because maintenance of ad hoc networks by a 
human operator is expected to be sporadic, they have 
to have a built-in autonomous mechanism for identify- 
ing user behavior that could be potentially damaging to 
them. This learning mechanism should itself minimize 
the need for a human intervention, therefore it should be 
self-tuning to the maximum extent. It must also be com- 
putationally conservative and meet the usual condition 
of high detection rate. The second criterion is the ability 
to undertake an action against one or several misbehav- 
ing users. This should be understood in a wider con- 
text of co-operating wireless devices acting in collusion 
in order to suppress or minimize the adverse impact of 
such misbehavior. Such co-operation should have a
low message complexity because both bandwidth
and battery life are scarce. The third and
last criterion requires that the protection system does 
not itself introduce new weaknesses to the systems that 
it should protect. 

An emerging solution that could facilitate implementation
of the above criteria is Artificial immune systems (AIS).
AIS are based on principles adapted from
the Human immune system (HIS) ifTSi l5l [T? I ; the ba- 
sic ability of HIS is an efficient detection of potentially 
harmful foreign agents (viruses, bacteria, etc.). The 
goal of AIS, in our setting, is the identification of nodes 
with behavior that could possibly negatively impact the 
stated mission of the sensor network. 

One of the key design challenges of AIS is to define
a suitable set of efficient genes. Genes form a basis for 
deciding whether a node misbehaves. They can be char- 
acterized as measures that describe a network's perfor- 
mance from a node's viewpoint. Given their purpose, 
they must be easy to compute and robust against decep- 
tion. 

Misbehavior in wireless sensor networks can take
different forms: packet dropping, modification of
data structures important for routing, modification of
packets, skewing of the network's topology or creating
fictitious nodes (see [13] for a more complete list). The
reason for sensors (possibly fully controlled by an at- 
tacker) to execute any form of misbehavior can range 
from the desire to save battery power to making a given 
wireless sensor network non-functional. Malfunction 
can also be considered a type of unwanted behavior. 



2 Artificial Immune Systems 
2.1 Background 

The Human immune system is a rather complex mech- 
anism able to protect humans against an amazing set of 
extraneous attacks. This system is remarkably efficient, 
most of the time, in discriminating between self and 
non-self antigens. A non-self antigen is anything that
can initiate an immune response; examples are a virus,
a bacterium, or a splinter. The opposite of non-self antigens
are self antigens; self antigens are the human organism's
own cells.



[Figures 1 and 2, flowchart labels: GENERATE RANDOM STRING; SELF STRINGS; MATCH (YES / NO); DETECTOR SET; NEW STRINGS]







Figure 1: T-cell (detector) generation by random- 
generate-and-test process. A (bit) string representation 
is assumed. 

Figure 2: Recognizing non-self is done by matching T-cells
(detectors) with suspected non-self antigens (new strings).



2.2 Learning 

The process of T-cell maturation in the thymus is used as
an inspiration for learning in AIS. The maturation of
T-cells (detectors) in the thymus is a result of a
pseudo-random process. After a T-cell is created (see Fig. 1),
it undergoes a censoring process called negative selection.
During negative selection, T-cells that bind self
are destroyed. The remaining T-cells are introduced into
the body. The recognition of non-self is then done by
simply comparing T-cells that survived negative selection
with a suspected non-self. This process is depicted
in Fig. 2. It is possible that the self set is incomplete
while a T-cell matures (tolerization period) in the thymus.
This could lead to producing T-cells that should
have been removed in the thymus and can cause an
autoimmune reaction, i.e. it leads to false positives.

A deficiency of the negative selection process is that 
alone it is not sufficient for assessing the damage that a 
non-self antigen could cause. For example, many bac- 
teria that enter our body are not harmful, therefore an 
immune reaction is not necessary. T-cells, actors of the 
adaptive immune system, require co-stimulation from 
the innate immune system in order to start acting. The 
innate immune system is able to recognize the pres- 
ence of harmful non-self antigens and tissue damage, 
and signal this to certain actors of the adaptive immune 
system. 

The random-generate-and-test approach for producing
T-cells (detectors) described above is analyzed
in ifTTI . In general, the number of candidate detectors
needs to be exponential in the self set size (if a matching
rule with fixed matching probability is used). Another
problem is a consistent underfitting of the non-self
set; there exist "holes" in the non-self set that are
undetectable. In theory, for some matching rules, the
number of holes can be very unfavorable ESl . In practical
terms, the effect of holes depends on the characteristics
of the non-self set, the representation and the matching
rule [15]. The advantage of this algorithm is its
simplicity and good experimental results in cases when
the number of detectors to be produced is fixed and
small [26]. A review of other approaches to detector
computation can be found in [12].

3 Sensor Networks 

A sensor network can be defined in a graph-theoretic
framework as follows: a sensor network is a net
N = (n(t), e(t)), where n(t) and e(t) are the sets of nodes
and edges at time t, respectively. Nodes correspond to
sensors that wish to communicate with each other. An edge
between two nodes A and B is said to exist when A is
within the radio transmission range of B and vice versa.
The imposed symmetry of edges is a usual assumption
of many mainstream protocols. A change in the
cardinality of the sets n(t) and e(t) can be caused by switching
one of the sensors on or off, failure, malfunction, removal,
signal propagation, link reliability and other factors.
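
As an illustration of this definition, the following minimal Python sketch builds the edge set e(t) for one time instant from node positions. The function name build_edges, the example coordinates and the assumption of one common circular radio range for all nodes are ours and not part of the model above; the common range is what makes the edges symmetric in this sketch.

```python
import math
from itertools import combinations


def build_edges(positions, radio_range):
    """Return the set of symmetric edges between nodes within radio range.

    positions: dict mapping a node name to its (x, y) coordinates in meters.
    radio_range: common transmission range in meters (an assumption of this sketch).
    """
    edges = set()
    for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2):
        # With a common range, A is in range of B exactly when B is in range of A.
        if math.dist(pa, pb) <= radio_range:
            edges.add((a, b))
    return edges


# Hypothetical example: three nodes and a 100 m radio range.
positions = {"A": (0.0, 0.0), "B": (60.0, 80.0), "C": (250.0, 0.0)}
edges = build_edges(positions, radio_range=100.0)
```

Recomputing the edge set whenever a node is switched on or off, fails or is removed gives the time dependence of n(t) and e(t).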

Data exchange in a point-to-point (uni-cast) scenario
usually proceeds as follows: a user-initiated data
exchange leads to a route query at the network layer of
the OSI stack. A routing protocol at that layer attempts
to find a route to the data exchange destination. This
request may result in a path of non-unit length. This
means that a data packet, in order to reach the
destination, has to rely on successive forwarding by
intermediate nodes on the path. An example of an
on-demand routing protocol often used in sensor networks
is DSR [20]. Route search in this protocol is started
only when a route to a destination is needed. This
is done by flooding the network with RREQ (footnote 2)
control packets. The destination node or an intermediate node
that knows a route to the destination will reply with a
RREP control packet. This RREP follows the route
back to the source node and updates routing tables at
each node that it traverses. A RERR packet is sent to
the connection originator when a node finds out that the
next node on the forwarding path is not replying.

At the MAC layer of the OSI protocol stack, the
medium reservation is often contention based. In order
to transmit a data packet, the IEEE 802.11 MAC
protocol uses carrier sensing with an RTS-CTS-DATA-ACK
handshake (footnote 3). Should the medium not be available
or the handshake fail, an exponential back-off algorithm
is used. This is combined with a mechanism that
makes it easier for neighboring nodes to estimate
transmission durations. This is done by exchange of
duration values and their subsequent storing in a data
structure known as the Network allocation vector (NAV). With
the goal of saving battery power, researchers have suggested
that a sleep-wake-up schedule for nodes would be appropriate.
This means that nodes do not listen continuously
to the medium, but switch themselves off and wake
up again after a predetermined period of time. Such
a sleep and wake-up schedule is, similarly to duration
values, exchanged among nodes. An example of a MAC
protocol designed specifically for sensor networks that
uses such a schedule is S-MAC [29]. A sleep and
wake-up schedule can severely limit operation of a node
in promiscuous mode. In promiscuous mode, a node
listens to the on-going traffic in the neighborhood and
collects information from the overheard packets. This
technique is used e.g. in DSR for improved propagation
of routing information.

Footnote 2: RREQ = Route Request, RREP = Route Reply, RERR = Route
Error.

Footnote 3: RTS = Ready to send, CTS = Clear to send, ACK =
Acknowledgment.

Movement of nodes can be modeled by means of a 
mobility model. A well-known mobility model is the 
Random waypoint model [20]. In this model, nodes
move from the current position to a new randomly gen- 
erated position at a predetermined speed. After reach- 
ing the new destination a new random position is com- 
puted. Nodes pause at the current position for a time 
period t before moving to the new random position. 
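
The movement rule described above can be sketched in Python as follows. This is only an illustration under our own assumptions: the function name random_waypoint, the rectangular area with its origin at (0, 0), and the parameter values in the usage line are hypothetical and are not taken from the simulation setup of this document.

```python
import math
import random


def random_waypoint(start, area, speed, pause, duration, rng):
    """Return the (time, x, y) arrivals of one node moving by random waypoint.

    start: initial (x, y) position; area: (width, height) of the field;
    speed: predetermined speed; pause: pause time at each position;
    duration: simulated time to cover; rng: a random.Random instance.
    """
    t = 0.0
    x, y = start
    trace = [(t, x, y)]
    while t < duration:
        # Pick a new random destination inside the area.
        nx, ny = rng.uniform(0.0, area[0]), rng.uniform(0.0, area[1])
        travel_time = math.hypot(nx - x, ny - y) / speed
        # Pause at the current position, then move at the fixed speed.
        t += pause + travel_time
        x, y = nx, ny
        trace.append((t, x, y))
    return trace


# Hypothetical usage: 100 x 100 field, speed 2, pause 5, 600 time units.
trace = random_waypoint((0.0, 0.0), (100.0, 100.0), 2.0, 5.0, 600.0, random.Random(7))
```

Taking the node positions at a single time instant of such a trace yields a static snapshot of the topology.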

For more information on sensor networks, we refer
the reader to [21].



4 Summary of Results 

Motivated by the positive results reported in rT7','26l we 
have undertaken a detailed performance study of AIS 
with focus on sensor networks. The general conclusions 
that can be drawn from the study presented in this doc- 
ument are: 

1. Given the ranges of input parameters that we used
and considering the computational capabilities of cur- 
rent sensor devices, we conclude that AIS based misbe- 
havior detection offers a decent detection rate. 

2. One of the main challenges in designing well-performing
AIS for sensor networks is choosing the set of "genes".
This is similar to observations made in [24].



3. Our results suggest that, to increase detection
performance, an AIS should benefit from information
available at all layers of the OSI protocol stack; this
also applies to detection performance with regard to a
simplistic flavor of misbehavior such as packet dropping.
This supports ideas briefly discussed in [30],
where the authors suggest that information available at
the application layer deserves more attention.

4. We observed that, somewhat surprisingly, a gene
based purely on the MAC layer significantly contributed
to the overall detection performance. This
gene poses fewer limitations when a MAC protocol with
a sleep-wake-up schedule such as S-MAC [29] is
used.

5. It is desirable to use genes that are "complementary"
with respect to each other. We demonstrated that
two genes, one that measures correct forwarding of data 
packets, and the other one that indirectly measures the 
medium contention, have exactly this property. 

6. We only used a single instance of learning and de- 
tection mechanism per node. This is different from ap- 
proach used in ifTTl l26l . where one instance was used 
for each of m possible neighbors. Our performance 
results show that the approach in IITtI |26| may not be 
feasible for sensor networks. It may allow for an easy 
Sybil attack and, in general, m = n - 1 instances might
be necessary, where n is the total number of sensors in 
the network. Instead, we suggest that flagging a node as 
misbehaving should, if possible, be based on detection 
at several nodes. 

7. Fewer than 5% of the detectors were used in detecting
misbehavior. This suggests that many of the detectors 
do not comply with constraints imposed by the commu- 
nications protocols; this is an important fact when de- 
signing AIS for sensor networks because the memory 
capacity at sensors is expected to be very limited. 

8. The data traffic properties seem not to impact the 
performance. This is demonstrated by similar detection 
performance, when data traffic is modeled as constant 
bit rate and Poisson distributed data packet stream, re- 
spectively. 

9. We were unable to distinguish between nodes that 
misbehave (e.g. deliberately drop data packets) and 
nodes with a behavior resembling a misbehavior (e.g. 
drop data packets due to medium contention). This mo- 
tivates the use of danger signals as described in fT, "76 ] . 
The approach applied in [26] does not, however, completely
fit sensor networks since these might implement
only a simplified version of the transport layer.



5 AIS for Sensor Networks: Design Principles

In our approach, each node produces and maintains its 
own set of detectors. This means that we applied a di- 
rect one-to-one mapping between a human body with 
a thymus and a node. We represent self, non-self and 
detector strings as bit-strings. The matching rule em- 
ployed is the r-contiguous bits matching rule. Two bit- 
strings of equal length match under the r-contiguous 
matching rule if there exists a substring of length r at 
position p in each of them and these substrings are identical.
Detectors are produced by the process shown in
Fig. 1, i.e. by means of negative selection, where detectors
are created randomly and tested against a set of self
strings.
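
The matching rule and the detector generation described above can be sketched in Python as follows. This is a minimal illustration, not the C++ implementation used in our experiments; the function names, the bit-strings represented as Python strings of '0' and '1', the max_tries cut-off and the small example self set are assumptions made for the sketch.

```python
import random


def r_contiguous_match(a, b, r):
    """True if equal-length bit-strings a and b agree on r contiguous positions."""
    if len(a) != len(b):
        raise ValueError("bit-strings must have equal length")
    run = 0
    for x, y in zip(a, b):
        # Length of the current run of identical bits ending at this position.
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def negative_selection(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Random-generate-and-test: keep random candidates that match no self string."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_non_self(antigen, detectors, r):
    """An antigen is flagged as non-self if any detector matches it."""
    return any(r_contiguous_match(d, antigen, r) for d in detectors)


# Hypothetical 12-bit self set, much shorter than the antigens used in this study.
self_set = ["000000000000", "000011110000", "111111000000"]
detectors = negative_selection(self_set, n_detectors=5, length=12, r=6,
                               rng=random.Random(0))
```

By construction, no string of the self set used during learning is ever flagged by the resulting detectors; false positives arise only for self strings that were not seen during learning.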

Each antigen consists of several genes. Genes are 
performance measures that a node can acquire locally 
without the help from another node. In practical terms 
this means that an antigen consists of x genes; each of 
them encodes a performance measure, averaged in our 
case over a time window. An antigen is then created by 
concatenating the x genes. 
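
The construction of an antigen from x genes can be sketched as follows. How a gene value is mapped to bits is not specified in this section, so the uniform quantization over a value range [lo, hi] used below is purely an assumption of the sketch, as are the function names; the default of 10 bits per gene follows the encoding listed in Figure 4.

```python
def encode_gene(value, lo, hi, bits=10):
    """Quantize a gene value uniformly over [lo, hi] into a bit-string.

    The uniform quantization is an assumption of this sketch. Values outside
    the range (including infinity) are clipped to the nearest bound.
    """
    clipped = min(max(value, lo), hi)
    levels = (1 << bits) - 1
    level = round((clipped - lo) / (hi - lo) * levels)
    return format(level, f"0{bits}b")


def build_antigen(values, ranges, bits=10):
    """Concatenate the encoded genes, in the given order, into one antigen."""
    return "".join(encode_gene(v, lo, hi, bits) for v, (lo, hi) in zip(values, ranges))


# Hypothetical window averages for five genes, each assumed to lie in [0, 1].
antigen = build_antigen([0.0, 1.0, 0.25, 0.5, 7.0], [(0.0, 1.0)] * 5)
```

With five genes of 10 bits each, the resulting antigen is 50 bits long.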

When choosing the correct genes, the choice is lim- 
ited due to the simplified OSI protocol stack of sensors. 
For example, Mica2 sensors fO) using the TinyOS oper- 
ating system do not guarantee any end-to-end connec- 
tion reliability (transport layer), leaving only data traffic 
at the lower layers for consideration. 

Let us assume that the routing protocol finds for a
connection the path s_s, s_1, ..., s_i, s_{i+1}, s_{i+2}, ..., s_d from
the source node s_s to the destination node s_d, where
s_s ≠ s_d and s_{i+1} ≠ s_d. We have used the following
genes to capture certain aspects of MAC and routing
layer traffic information (we averaged over a time
period (window size) of 500 seconds):

MAC Layer: 

#1 Ratio of complete MAC layer handshakes between
nodes s_i and s_{i+1} and RTS packets sent by s_i to
s_{i+1}. If there is no traffic between two nodes this
ratio is set to ∞ (a large number). This ratio is
averaged over a time period. A complete handshake
is defined as a completed sequence of RTS, CTS,
DATA, ACK packets between s_i and s_{i+1}.

#2 Ratio of data packets sent from s_i to s_{i+1} and then
subsequently forwarded by s_{i+1} to s_{i+2}. If there
is no traffic between two nodes this ratio is set to
∞ (a large number). This ratio is computed by
s_i in promiscuous mode and, as in the previous
case, averaged over a time period. This gene was
adapted from the watchdog idea in [25].

#3 Time delay that a data packet spends at s_{i+1}
before being forwarded to s_{i+2}. The time delay is
observed by s_i in promiscuous mode. If there is
no traffic between two nodes the time delay is set
to zero. This measure is averaged over a time period.
This gene is a quantitative extension of the
previous gene.

Routing Layer: 

#4 The same ratio as in #2 but computed separately 
for RERR routing packets. 

#5 The same delay as in #3 but computed separately 
for RERR routing packets. 
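
As a rough illustration, the genes above could be computed from per-window counters as in the following Python sketch. The counter names, the constant LARGE standing in for the "large number", and the reading of Gene #2 as the fraction of sent data packets that were subsequently forwarded are our assumptions for this sketch; the definitions above do not fix these details.

```python
LARGE = 1e9  # stands in for the "large number" used when there is no traffic


def handshake_ratio(complete_handshakes, rts_sent):
    """Gene #1: complete handshakes per RTS packet sent in one time window."""
    return complete_handshakes / rts_sent if rts_sent else LARGE


def forwarding_ratio(data_sent, data_forwarded):
    """Gene #2 (also #4 for RERR packets): share of sent packets seen forwarded.

    Assumes the ratio is forwarded / sent; the counts are those overheard
    by the monitoring node in promiscuous mode within one time window.
    """
    return data_forwarded / data_sent if data_sent else LARGE


def mean_forwarding_delay(delays):
    """Gene #3 (also #5 for RERR packets): mean delay at the monitored node.

    Returns zero when no traffic was observed in the window.
    """
    return sum(delays) / len(delays) if delays else 0.0


# Hypothetical counters for one window.
gene1 = handshake_ratio(complete_handshakes=8, rts_sent=10)
gene2 = forwarding_ratio(data_sent=10, data_forwarded=5)
gene3 = mean_forwarding_delay([0.1, 0.3])
```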

Gene #1 can be characterized as MAC layer
quality oriented: it indirectly measures the medium
contention level. The remaining genes are watchdog
oriented. This means that they more strictly fit a
certain kind of misbehavior. Gene #2 can help
detect whether packets get correctly forwarded; Gene
#3 can help detect whether forwarding of packets
gets intentionally delayed. As we will show later,
for the particular type of misbehavior (packet dropping)
that we applied, the first two genes come out as "the
strongest". The disadvantage of the watchdog based
genes is that, due to limited battery power, nodes could
operate using a sleep-wake-up schedule similar to the
one used in S-MAC. This would mean that the node
s_i has to stay awake until the node s_{i+1} (the monitored
node) correctly transmits to s_{i+2}. The consequence
would be a longer wake-up time and possible restrictions
in publishing sleep-wake-up schedules.

In [24] the authors applied a different set of genes,
based only on the DSR routing protocol. The observed
set of events was the following: A = RREQ sent, B = 
RREP sent, C = RERR sent, D = DATA sent and IP 
source address is not of the monitored (neighboring) 
node, E = RREQ received, F = RREP received, G = 
RERR received, H = DATA received and the IP destina- 
tion address is not of the monitored node. The events D 
and H take into consideration that the source and desti- 
nation nodes of a connection might appear as misbehav- 
ing as they seem to "deliberately" create and delete data 
packets. Then the set of their four genes is as follows: 



#1 Number of E over a time period. 

#2 Number of (E*(A or B)) over a time period. 

#3 Number of H over a time period. 

#4 Number of (H*D) over a time period. 

The time period (window size) in their case was 10s; 
* is the Kleene star operator (zero or more occurrences 
of any event(s) are possible). Similar to our watch- 
dog genes, these genes impose additional requirements 
on MAC protocols such as the S-MAC. Their depen- 
dence on the operation in promiscuous mode is, how- 
ever, more pronounced as a node has to continuously 
observe packet events at all monitored nodes. 

The research in the area of what and to what extent 
can be or should be locally measured at a node, is inde- 
pendent of the learning mechanism used (negative se- 
lection in both cases). Performance of an AIS can partly 
depend on the ordering and the number of used genes. 
Since longer antigens (consisting of more genes) indi- 
rectly imply more candidate detectors, the number of 
genes should be carefully considered. Given x genes, 
it is possible to order them in x! different ways. In our
experience, the rules for ordering genes and the number 
of genes can be summed up as follows: 

1) Keep the number of genes small. In our experi- 
ments, we show that with respect to the learning mech- 
anism used and the expected deployment (sensor net- 
works), 2-3 genes are enough for detecting a basic type 
of misbehavior. 

2) Order genes either randomly or use a predetermined
fixed order. Defining a utility relation between
genes, and ordering genes with respect to it can, in general,
lead to problems that are considered intractable.
Our results suggest, however, that it is important to
understand relations between different genes, since genes are
able to complement each other; this can lead to their
increased mutual strength. On the other hand, random
ordering adds to the robustness of the underlying AIS:
it is more difficult for an attacker to deceive the AIS if he
does not know how genes are being used. How to balance
these two considerations is currently an open question.

3) Genes cannot be considered in isolation. Our
experiments show that when a detector matched an antigen
under the r-contiguous matching rule, this match
usually spanned several genes. This motivates the
design of matching rules that would not limit matching to
a few neighboring genes and would offer more flexibility,
but still require that a gene remains a partly atomic unit.

5.1 Learning and Detection

Learning and detection are done by applying the mechanisms
shown in Figs. 1 and 2. The detection itself
is very straightforward. In the learning phase, a
misbehavior-free period (see |[T1 on possibilities for cir- 
cumventing this problem) is necessary so that nodes get 
a chance to learn what is the normal behavior. When 
implementing the learning phase, the designer gets to 
choose from two possibilities: 

1) Learning and detection at a node get implemented
for each neighboring node separately. This means that
different antigens have to get computed for each
neighboring node, detector computation is different for each
neighboring node and, subsequently, detection is different
for each neighboring node. The advantage of this
approach is that the node is able to directly determine
which neighboring node misbehaves; the disadvantage
is that m instances (m is the number of neighbors or
node degree) of the negative selection mechanism have
to get executed; this can be computationally prohibitive
for sensor networks as m can, in general, be equal to the
total number of sensors. This approach also allows for an
easy Sybil attack [13] in which a neighbor would create several
identities; the node would then be unable to recognize
that these identities belong to the same neighbor. This
approach was used in [26, 24].

2) Learning and detection at a node get implemented 
in a single instance for all neighboring nodes. This 
means a node is able to recognize anomaly (misbehav- 
ior) but it may be unable to determine which one from 
the m neighboring nodes misbehaves. This implies that
nodes would have to cooperate when detecting a mis- 
behaving node, exchange anomaly information and be 
able to draw a conclusion from the obtained informa- 
tion. An argument for this approach is that in order 
to detect nodes that misbehave in collusion, it might 
be necessary to rely to some extent on information ex- 
change among nodes, thus making this a natural solu- 
tion to the problem. We have used this approach; a post- 
processing phase (using the list of misbehaving nodes) 
was necessary to determine whether a node was cor- 
rectly flagged as misbehaving or not. 

We find the second approach to be more suited for
wireless sensor networks, since it is less computationally
demanding. We are unable, at this time, to estimate
the frequency of a complete detector set computation.

Both approaches can be classified within the four-layer
architecture (Fig. 3) that we introduced in [14].
The lowermost layer, Data collection and preprocessing,
corresponds to genes' computation and antigen
construction. The Learning layer corresponds to the
negative selection process. The next layer, Local and
co-operative detection, suggests that an AIS should benefit
from both local and cooperative detection. Both our
setup and the setup described in flE' '24] only apply
local detection. The uppermost layer, Local and
co-operative response, implies that an AIS should also have
the capability to undertake an action against one or several
misbehaving nodes; this should be understood in a
wider context of co-operating wireless devices acting in
collusion in order to suppress or minimize the adverse
impact of such misbehavior. To the best of our knowledge,
there is currently no AIS implementation for sensor
networks taking advantage of this layer.

Figure 3: A four-layer architecture aimed at protecting
sensor networks against misbehavior and abuse.

Which r is the correct one? 

An interesting technical problem is to tune the r
parameter for the r-contiguous matching rule so that the
underlying AIS offers good detection and false positive
rates. One possibility is a lengthy simulation study
such as this one. Through multiparameter simulation
we were able to show that r = 10 offers the best
performance for our setup. In [12] we experimented with
the idea of "growing" and "shrinking" detectors; this
idea was motivated by [19]. The initial r_0 for a growing
detector can be chosen as r_0 = [...], where l is
the detector length. The goal is to find the smallest r
such that a candidate detector does not match any self
antigen. This means that, initially, a larger (more specific)
r is chosen; the smallest r that fulfills the above condition
can be found through binary search. For shrinking
detectors, the approach is reciprocal. Our goal was to
show that such growing or shrinking detectors would
offer a better detection or false positive rate. Short of
proving this in a statistically significant manner, we
observed that the growing detectors can be used for
self-tuning the r parameter. The average r value was close
to the r determined through simulation (the setup in that
case was different from the one described in this
document).
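
The binary search for the smallest r can be sketched in Python as follows. The sketch relies on the observation that if two bit-strings match on r contiguous bits they also match on any smaller number of contiguous bits, so the property "the detector matches no self string" is monotone in r. The function names and the search range from 1 to the detector length are assumptions of this sketch; it does not reproduce the exact procedure or the initial value r_0 used in [12].

```python
def r_contiguous_match(a, b, r):
    """True if equal-length bit-strings a and b agree on r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def smallest_safe_r(detector, self_set):
    """Smallest r for which the detector matches no self string, else None."""
    def safe(r):
        return not any(r_contiguous_match(detector, s, r) for s in self_set)

    lo, hi = 1, len(detector)
    if not safe(hi):
        return None  # the detector is identical to a self string
    while lo < hi:
        mid = (lo + hi) // 2
        if safe(mid):
            hi = mid      # mid works, try a smaller (more general) r
        else:
            lo = mid + 1  # mid still matches self, r must be larger
    return lo


# Hypothetical example: the longest agreement with any self string is 4 bits.
best_r = smallest_safe_r("1111000011", ["1111111111", "0000000000"])
```

For a detector of length l, the search needs about log2(l) evaluations of the matching rule against the self set.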

5.2 Further Optimizations 

Our experiments show that only a small number of detectors
ever get used (less than 5%). The reason is that they
get produced in a random way, not considering the structure
of the protocols. For example, a detector that is able to
detect whether i) data packets got correctly transmitted
and ii) 100% of all MAC layer handshakes were incomplete
is superfluous as this case should never happen.
In fSl, the authors conclude: "... uniform coverage
of non-self space is not only unnecessary, it is imprac- 
tical; non-self space is too big". Application driven 
knowledge can be used to set up a rule based system that 
would exclude infeasible detectors; see [10] for a rule
based system aimed at improved coverage of the non- 
self set. In fTT], it is suggested that unused detectors 
should get deleted and the lifetime of useful detectors 
should be extended. 

5.3 Misbehavior 

In a companion paper [13], we have reviewed different
types of misbehavior at the MAC, network and transport 
layers of the OSI protocol stack. We note that solutions 
to many of these attacks have been already proposed; 
these are, however, specific to a given attack. Additionally,
due to the limitations of sensor networks, these solutions
cannot be directly transferred.

The appeal of AIS based misbehavior detection rests 
on its simplicity and applicability in an environment 
that is extremely computationally and bandwidth limited.
Misbehavior in sensor networks does not have to
be executed by sensors themselves; one or several
computationally more powerful platforms (laptops) can be
used for the attack. On the other hand, protection
using such more advanced computational platforms is
harder to imagine, due to e.g. the need to supply them
continuously with electric power. It would also create
a point of special interest for possible attackers.

6 Experimental Setup 

The purpose of our experiments was to show that AIS 
are a viable approach for detecting misbehavior in sen- 
sor networks. Furthermore, we wanted to cast light on 
internal performance of an AIS designed to protect sen- 
sor networks. One of our central goals was to provide 
an in-depth analysis of relative usefulness of genes. 

Definitions of input and output parameters: The in- 
put parameters for our experiments were: r parameter 
for the r-contiguous matching rule, the (desired) num- 
ber of detectors and misbehavior level. Misbehavior 
was modeled as random packet dropping at selected 
nodes. 

The performance (output) measures were arithmetic
averages and 95% confidence intervals (CI_95%) of the detection
rate, number of false positives, real time to compute
detectors, data traffic rate at nodes, number of iterations
to compute detectors (number of random tries), number
of non-valid detectors, number of different (unique)
antigens in a run or a time window, and number of
matches for each gene. The detection rate dr is defined
as dr = dns / ns, where dns is the number of detected non-self
strings and ns is the total number of non-self strings. A
false positive in our definition is a string that is not self
but can still be the result of an anomaly whose effects are
identical to the effects of a misbehavior. A non-valid detector is a
candidate detector that matches a self string and must
therefore be removed.

The number of matches for each gene was evaluated
using the r-contiguous matching rule; we considered
two cases: i) two bit-strings get matched from the
left to the right and the first such match gets
reported (matching gets interrupted), ii) two bit-strings
get matched from the left to the right and all possible
matches get reported. The time complexity of these
two approaches is O(r(l - r)) and Θ(r(l - r)),
respectively; r < l, where l is the bit-string length. The first
approach is exactly what we used when computing the
real time necessary for negative selection; the second
approach was used when our goal was to evaluate the
relative usefulness of each gene.
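
The two matching variants can be sketched in Python as follows. The naive window comparison below spends up to r bit comparisons at each of the l - r + 1 start positions, in line with the stated bounds. The function names, the 0-based start positions and the helper that maps a match to 1-based gene indices (assuming 10-bit genes as in Figure 4) are our assumptions for this sketch.

```python
def match_positions(a, b, r, first_only=False):
    """Start positions p at which a and b are identical on bits p .. p + r - 1.

    With first_only=True the scan stops at the first match (case i);
    otherwise all matches are reported (case ii).
    """
    hits = []
    for p in range(len(a) - r + 1):
        if a[p:p + r] == b[p:p + r]:
            hits.append(p)
            if first_only:
                break
    return hits


def genes_spanned(p, r, gene_bits=10):
    """1-based indices of the genes overlapped by a match starting at bit p."""
    first_gene = p // gene_bits + 1
    last_gene = (p + r - 1) // gene_bits + 1
    return list(range(first_gene, last_gene + 1))


# Hypothetical 20-bit detector and antigen that differ only at bit 5.
detector = "0" * 20
antigen = "0" * 5 + "1" + "0" * 14
all_hits = match_positions(detector, antigen, r=10)
first_hit = match_positions(detector, antigen, r=10, first_only=True)
```

Counting, for every reported position, the genes returned by genes_spanned gives a per-gene match count of the kind used to compare the relative usefulness of genes.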

Scenario description: We wanted to capture "self"
and "non-self" packet traffic in a large enough synthetic
static sensor network and test whether, using an AIS, we
are able to recognize non-self, i.e. misbehavior.

(i) Negative selection algorithm: random-generate-and-test. Implemented in C++, compiled with GNU g++ v4.0 with -O3
option.

(ii) Input parameters: 1. r-contiguous matching rule with r = {7, 10, 13, 16, 19, 22}. 2. Encoding: 5 genes each 10 bits
long = 50 bits. 3. Number of detectors {500, 1000, 2000, 4000}. 4. Misbehavior level {10, 30, 50%}. 5. Window size 500
seconds; 28 complete windows over 4-hour simulation time.

(iii) Performance measures: real time to compute detectors, number of iterations to compute detectors, detection rate,
false positives rate, rate of non-valid detectors, data traffic rate at nodes, number of different antigens in a run, number of
matches for each gene; their arithmetic averages and 95% confidence intervals (where applicable).

(iv) Network topology: Snapshot of movement modeled by random waypoint mobility model, i.e. it is a static network.
There were 1,718 nodes. The area was a square of 2,900m x 2,950m. The transmission range of transceivers was 100
meters.

(v) Number of connections: 10 CBR (constant bit rate) connections. MAC protocol: IEEE 802.11b DCF. Routing
protocol: DSR. Other parameters: (i) Propagation path-loss model: two ray (ii) Channel frequency: 2.4 GHz (iii) Topography:
Line-of-sight (iv) Radio type: Accnoise (v) Network protocol: IPv4 (vi) Connection type: UDP

(vi) Injection rate: 1 packet/second. 14,400 packets per connection were injected. Packet size was 512 bytes.

(vii) The number of independent simulation runs for each combination of input parameters was 20. The simulation time
was 4 hours.

(viii) Simulator used: GloMoSim 2.03; hardware used: 30x Linux (SuSE 10.0) PC with 2GB RAM and Pentium 4 3GHz
microprocessor.

Figure 4: Parameters used in the experiment.

The topology of this network was determined by
making a snapshot of 1,718 mobile nodes (each
with 100m radio radius) moving in a square area of
2,900m x 2,950m as prescribed by the random waypoint
mobility model; see Figure 5(a). The motivation for
using this movement model and then creating a snapshot
comes from the results in our previous paper [7], which
deals with the structural robustness of sensor networks.
Our preference was to use a slightly bigger network than
might be necessary, rather than using a network with
unknown properties. The computational overhead is
negligible; simulation real time mainly depends on the
number of events that require processing. Idle nodes
increase memory requirements, but memory availability
at the computers was in our case not a bottleneck.

We chose source and destination pairs for each con- 
nection so that several alternative independent routes 
exist; the idea was to benefit from route repair and route 
acquisition mechanisms of the DSR routing protocol, so 
that the added value of AIS based misbehavior detection 
is obvious. 

We used 10 CBR (constant bit rate) connections.
The connections were chosen so that their length is
approximately 7 hops and so that they share some common
intermediate nodes; see Figure 5(b). For each packet
received or sent by a node we captured the following
information: IP header type (UDP, 802.11 or DSR in
this case), MAC frame type (RTS, CTS, DATA, ACK in
the case of 802.11), current simulation clock, node
address, next hop destination address, data packet source
and destination address and packet size.

Encoding of self and non-self antigens: Each of the
five genes was transformed into a 10-bit signature where
each bit defines an interval of a gene specific value
range. We created self and non-self antigen strings by
concatenation of the defined genes. Each self and
non-self antigen therefore has a size of 50 bits. The
interval representation was chosen in order to avoid
carry-bits (the Gray coding is an alternative solution).
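The interval encoding can be illustrated with a short Python sketch. It assumes equal-width intervals, that the leftmost bit stands for the lowest interval, and that out-of-range values are clamped; none of these details is specified in the text. The names `encode_gene` and `encode_antigen` are hypothetical.

```python
def encode_gene(value: float, lo: float, hi: float, levels: int = 10) -> str:
    """Encode one gene value as a bit-string with exactly one bit set.
    The range [lo, hi] is split into `levels` equal-width intervals
    (an assumption) and the bit of the interval containing `value`
    is set to 1."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(value, lo), hi)          # assumption: clamp outliers
    index = min(int((clamped - lo) / (hi - lo) * levels), levels - 1)
    return "".join("1" if i == index else "0" for i in range(levels))


def encode_antigen(values, ranges) -> str:
    """Concatenate the per-gene signatures; five genes give 50 bits."""
    return "".join(encode_gene(v, lo, hi) for v, (lo, hi) in zip(values, ranges))
```

With five genes the resulting antigen has 50 bits, of which exactly five are set. Neighboring value levels differ only in the position of a single set bit, so no carry propagates across bit positions.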

Constructing the self and non-self sets: We have ran- 
domly chosen 28 non-overlapping 500-second windows 
in our 4-hour simulation. In each 500-second window 
self and non-self antigens are computed for each node. 
This was repeated 20 times for independent Glomosim 
runs. 

Misbehavior modeling: Misbehavior is modeled as
random data packet dropping (implemented at the
network layer). Data packets include both data packets
from the transport layer and routing protocol packets;
packets that should get dropped are simply not inserted
into the IP queue. We randomly chose 236 nodes and
forced them to drop {10, 30, 50}% of data packets.
However, in each simulation run there were only 3-10
misbehaving nodes with a statistically significant number
of packets to forward; see constraint C2 in Section 7.

* The interval encoding of genes is adapted from [26]. This way
only one of the 10 bits is set to 1, i.e. only 10 possible value
levels can be encoded in this case.
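Random packet dropping at a misbehaving node can be sketched in a few lines of Python. This is only an illustration of the model, not the GloMoSim modification used in the experiments; `forward_or_drop` and its parameters are hypothetical names.

```python
import random


def forward_or_drop(packets, drop_rate: float, rng: random.Random):
    """Return the packets a misbehaving node still forwards.
    Each packet is dropped independently with probability `drop_rate`;
    a dropped packet is simply never queued for transmission."""
    return [p for p in packets if rng.random() >= drop_rate]
```

For example, `forward_or_drop(range(1000), 0.3, random.Random(1))` keeps roughly 70% of the packets.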

Detection: A neighboring node gets flagged as
misbehaving if a detector from the detector set matches
an antigen. Since we used a single learning phase, we had
to complement this process with some routing
information analysis. This allowed us to determine which
of the neighboring nodes is actually the misbehaving
one. In the future, we plan to rely on co-operative
detection in order to replace such a post-analysis.

Simulation phases: The experiment was done in four 
phases. 

1. 20 independent Glomosim runs were done for one
of the {10, 30, 50}% misbehavior levels and for "normal"
traffic. Normal means that no misbehavior took
place.

2. Self and non-self antigen computation (encoding). 

3. The 20 "normal" traffic runs were used to com- 
pute detectors. Given the 28 windows and 20 runs, 
the sample size was 20x28 = 560, i.e. detectors 
at each node were discriminated against 560 self 
antigens. 



4. Using the runs with {10, 30, 50}% misbehavior
levels, the process shown in Fig. 2 was used for
detection; we restricted ourselves to nodes that had at
least a certain number of data packets to forward in
both the normal and the misbehavior traffic (packet
threshold).

The experiment was then repeated with different r, 
desired number of detectors and misbehavior level. 
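The detector computation in phase 3 and the matching step in phase 4 can be sketched as follows. This is a minimal Python sketch of random-generate-and-test negative selection, not the C++ implementation used in the experiments; the function names and the `max_tries` safety cap are assumptions made for illustration.

```python
import random


def r_contiguous(a: str, b: str, r: int) -> bool:
    """True if a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, r, length=50, rng=None,
                       max_tries=1_000_000):
    """Draw random candidates and keep those that match no self antigen.
    Returns (detectors, number of tries, number of non-valid candidates)."""
    rng = rng or random.Random()
    detectors, tries, non_valid = [], 0, 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        if any(r_contiguous(candidate, s, r) for s in self_set):
            non_valid += 1              # matches self: discard
        else:
            detectors.append(candidate)
    return detectors, tries, non_valid


def is_flagged(detectors, antigen: str, r: int) -> bool:
    """Detection: an antigen is non-self if any detector matches it."""
    return any(r_contiguous(d, antigen, r) for d in detectors)
```

The returned counters correspond to the number of iterations and the rate of non-valid detectors (`non_valid / tries`) listed among the performance measures.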

The parameters for this experiment are summarized
in Fig. 4. The injection rate and packet sizes were
chosen in order to comply with usual data rates of sensors
(e.g. 38.4 kbps for Mica2; see [9]). We chose the
Glomosim simulator [3] over other options (most notably
ns2) because of its better scaling characteristics [6] and
our familiarity with the tool.

7 Results Evaluation 

When evaluating our results we define two additional 
constraints: 

C1. We define a node to be detected as misbehaving if
it gets flagged in at least 14 out of the 28 possible 
windows. This notion indirectly defines the time 
until a node is pronounced to be misbehaving. We 
call this a window threshold.

[Figure 6, panels (a)-(c); x-axis: desired number of detectors.]
(a) Real time to compute the desired number of detectors at a node; ci95% < 1%.
(b) Rate of non-valid detectors; for r ≤ 13 ci95% < 1%, for r ≥ 16 the sample size is not significant.
(c) Number of iterations needed in order to compute the desired number of detectors; for r ≥ 10 ci95% < 1%, for r = 7 ci95% < 2%.

Figure 6: Performance of detectors computation.

[Figure 7, panels (a)-(c); number of detectors = 2000.]
(a) Detection rate vs packet threshold; conf. interval ranges: for mis. level 10% ci95% = 3.8-19.8%; for 30% ci95% = 11.9-15.9%; for 50% ci95% = 11.0-14.2%.
(b) Detection rate vs r; ci95% range similar to (a).
(c) Number of false positives; for r ≤ 10 ci95% = 0.47-0.68, for r ≥ 13 the sample size is not significant.

Figure 7: Performance of misbehavior detection. Misbehavior level = {10, 30, 50}%. In (a) r = 10, in (b) and (c)
the packet threshold was 1000.

[Figure 8, panels (a)-(c).]
(a) Total number of runs with window threshold ≥ 14.
(b) The number of unique detectors that matched an antigen in a run. Conf. interval range for 7 ≤ r ≤ 13 is ci95% = 6.5-10.1%.
(c) The number of unique detectors that matched an antigen in a window; each run has 28 windows. Conf. interval range: ci95% < 0.16%.

Figure 8: Window threshold and detector related performance measures.

Figure 9: Antigen and gene related performance measures.

Figure 10: Performance of Genes #1 through #5 for the number of detectors = 2000 and (a) r = 7, (b) r = 10, (c)
r = 13.

C2. A node s_i has to forward on average at least m
packets over the 20 runs in both the "normal" and
misbehavior cases in order to be included in our
statistics. This constraint was set in order to make
the detection process more reliable. It is dubious
to flag a neighboring node of s_i as misbehaving
if the decision is based on "normal" runs or runs
with misbehavior in which node s_i had no data
packets to forward (it was not on a routing path).
We call this a packet threshold; m was in our
simulations chosen from {500, 1000, 2000, 4000}.
Example: for a fixed set of input parameters, a node
forwarded on average 1,250 packets in the "normal"
runs and 750 packets in the misbehavior runs (with
e.g. level 30%). The node s_i would be considered
for misbehavior detection if m = 500, but not if
m ≥ 1000. In other words, a node has to get a
chance to learn what is "normal" and then to use
this knowledge on a non-empty packet stream.
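Constraints C1 and C2 amount to two simple threshold checks, sketched below in Python. The function and parameter names are hypothetical; the numbers in the usage note are those of the example in C2.

```python
def passes_packet_threshold(avg_normal: float, avg_misbehavior: float,
                            m: int) -> bool:
    """C2: the node must forward on average at least m packets in both
    the "normal" runs and the misbehavior runs."""
    return avg_normal >= m and avg_misbehavior >= m


def detected_as_misbehaving(window_flags, window_threshold: int = 14) -> bool:
    """C1: a node counts as detected if it was flagged in at least
    `window_threshold` of its windows (28 windows per run)."""
    return sum(1 for flagged in window_flags if flagged) >= window_threshold
```

With 1,250 packets in the normal runs and 750 in the misbehavior runs, `passes_packet_threshold(1250, 750, 500)` holds while `passes_packet_threshold(1250, 750, 1000)` does not.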

7.1 Overall Performance 

The results related to computation of detectors are
shown in Figure 6. In our experiments we considered
the desired number of detectors to be at most 4,000;
above this threshold the computational requirements
might be too high for current sensor devices. We
remind the reader that each time the r parameter is
incremented by 1, the number of detectors should double
in order to make these two cases comparable.
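The doubling rule can be motivated by a commonly used approximation for the probability $P_M$ that two random binary strings of length l match under the r-contiguous rule. The approximation is not stated in this paper and is quoted here as an assumption; it is meant for the regime $2^{-r} \ll 1$.

```latex
P_M(r) \approx 2^{-r}\left[\frac{l-r}{2}+1\right],
\qquad
\frac{P_M(r)}{P_M(r+1)} = 2\,\frac{l-r+2}{l-r+1} \approx 2 .
```

For l = 50 and r = 10 the ratio is 2 · 42/41 ≈ 2.05, so a single detector covers roughly half as much of the string space after r is incremented, and about twice as many detectors are needed for comparable coverage.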

Figure 6(a) shows the real time needed to compute
the desired set of detectors. The real time necessary
increases proportionally with the desired number of
detectors; this complies with the theoretical results
presented in [11]. Figure 6(b) shows the percentage of
non-valid detectors, i.e. candidate detectors that
were found to match a self string (see Figure 1). This
result points to where the optimal operation point of an
AIS might lie with respect to the choice of the r
parameter and the choice of a fixed number of detectors
to compute. We remind the reader that the larger the r
parameter, the smaller the probability that a detector
will match a self string. Therefore, the overhead
connected to choosing a prohibitively small r should be
considered when designing an AIS. Figure 6(c) shows the
total number of generate-and-test tries needed for
computation of a detector set of a fixed size; the 95%
confidence interval is less than 2%.

In Figure 7(a) we show the dependence of the
detection rate on the packet threshold. We conclude that,
except for some extremely low threshold values (not
shown), the detection rate stays constant. This figure
also shows that when the misbehavior level was set very
low, i.e. 10%, the AIS struggled to detect misbehaving
nodes. This is partly a result of our coarse encoding
with only 10 different levels.

At the 30 and 50% misbehavior levels the detection
rate stays solid at about 70-85%. The range of the 95%
confidence interval of the detection rate is 3.8-19.8%.
The fact that the detection rate did not get closer to
100% suggests that either the implemented genes are not
sufficient, detection should be extended to protocols at
other layers of the OSI protocol stack, a different
ordering of genes should have been applied, or our ten
level encoding was too coarse. It also implies that
watchdog based genes (though they perfectly fit the
implemented misbehavior) should not be used in isolation
and, in general, that genes have to be chosen very
carefully.

Figure 7(b) shows the impact of r on the detection
rate. When r = {7, 10} the AIS performs well; for r > 10
the detection rate decreases. This is caused by the
inadequate numbers of detectors used at higher levels of
r (we limited ourselves to max. 4,000 detectors).

Figure 7(c) shows the number of false positives. We
remind the reader that in our definition false positives
are both nodes that do not drop any packets and nodes
that drop packets for reasons other than misbehavior.

In a separate experiment we studied whether the 
4-hour (560 samples) simulation time was enough to 
capture the diversity of the self behavior. This was 
done by trying to detect misbehavior in 20 independent 
misbehavior-free Glomosim runs (different from those 
used to compute detectors). We report that we did not 
observe a single case of an autoimmune reaction. 

7.2 Detailed Performance 

In Fig. 8(a) we show the total number of runs in which a
node was identified as misbehaving. The steep decline
for values r > 10 (in this and other figures) documents
that in these cases it was necessary to produce a higher
number of detectors in order to cover the non-self
antigen space. The higher the r, the higher the
specificity of a detector; this means that it is able to
match a smaller set of non-self antigens.

In Fig. 8(b) and (c) we show the number of detectors
that got matched during the detection phase (see
Fig. 2). Fig. 8(b) shows the number of detectors matched
per run. Fig. 8(c) shows the number of detectors matched
per window. Fig. 8(b) is an upper estimate on the number
of unique detectors needed in a single run. Given that
the total number of detectors was 2,000, fewer than 5%
of the detectors would get used in the detection
phase. The tight confidence interval^ for the number of
unique detectors matched per window (see Fig. 8(c)) is a
direct consequence of the small variability of antigens
as shown in Fig. 9(a).

Fig. 9(a) shows the number of unique antigens that
were subject to classification into self or non-self. The
average for r = {7, 10} is about 1.5. This fact does
not directly imply that the variability of the data traffic
would be inadequate. It is rather a direct consequence
of our choice of genes and their encoding (we only used
10 value levels for encoding). Fig. 9(b) shows the
number of matches between a detector and an antigen in
the following way. When a detector under the
r-contiguous matching rule matches only a single gene
within an antigen, we increment the "single" counter.
Otherwise, we increment the "multiple" counter. It is
obvious that with increasing r, it gets more and more
probable that a detector would match more than a single
gene. The interesting fact is that the detection rate
for both r = 7 and r = 10 is about 80% (see Fig. 7(a))
while the rate of non-valid detectors is very different
(see Fig. 6(b)). This means that an interaction between
genes has positively affected the latter performance
measure without sacrificing the former one. This leads
to the conclusion that genes should not be considered in
isolation.
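The distinction between the "single" and "multiple" counters can be illustrated with a short Python sketch. It assumes that a match counts as "single" when its window of r contiguous bits lies entirely within one 10-bit gene; the text does not spell out this rule, and all names below are hypothetical.

```python
GENE_BITS = 10   # bits per gene
N_GENES = 5      # genes per antigen


def genes_covered(start: int, r: int, gene_bits: int = GENE_BITS) -> list:
    """1-based indices of the genes touched by a match window that
    starts at bit `start` and spans r contiguous bits."""
    first = start // gene_bits
    last = (start + r - 1) // gene_bits
    return list(range(first + 1, last + 2))


def classify_match(start: int, r: int) -> str:
    """"single" if the window stays inside one gene, else "multiple"."""
    return "single" if len(genes_covered(start, r)) == 1 else "multiple"


def single_window_count(r: int, length: int = GENE_BITS * N_GENES):
    """Count how many of the possible window positions stay within a
    single gene; returns (single positions, all positions)."""
    starts = range(length - r + 1)
    singles = sum(1 for s in starts if classify_match(s, r) == "single")
    return singles, len(starts)
```

Under this assumption 20 of the 44 possible window positions fall within a single gene for r = 7, only 5 of 41 for r = 10, and none for r = 13, which matches the observation that multi-gene matches become more probable as r grows.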

Fig. 9(c) shows the performance of Gene #1. The
number of matches shows that this gene contributed
to the overall detection performance of our AIS.
Figs. 10(a-c) sum up the performance of the five genes
for different values of r. Again, an interesting fact is
the contribution of Gene #1 to the overall detection
performance. The usefulness of Gene #2 was largely
expected as this gene was tailored for the kind of
misbehavior that we implemented. The other three genes
came out as marginally useful. The importance of the
somewhat surprising performance of Gene #1 is that it
can be computed in a simplistic way and does not require
continuous operation of a node.

7.3 The Impact of Data Traffic Pattern 

In an additional experiment, we examined the impact
of data traffic pattern on the performance. We used
two different data traffic models: the constant bit rate
(CBR) and a Poisson distributed data traffic. In many
scenarios, sensors are expected to take measurements
at constant intervals and, subsequently, send them out
for processing. This would create a constant bit rate
traffic. Poisson distributed traffic could be a result of
sensors taking measurements in an event-driven fashion.
For example, a sensor would take a measurement
only when a target object (e.g. a person) happens to be
in its vicinity.

^ For practical reasons we show ci95% only for 7 ≤ r ≤ 13.

The setup for this experiment was similar to that
presented in Fig. 4, with the additional fact that the data
traffic model would now become an input parameter.
With the goal of reducing the complexity of the
experimental setup, we fixed r = 10 and we only considered
cases with 500 and 2000 detectors. In order to match
the CBR traffic rate, the Poisson distributed data traffic
model had a mean arrival rate of 1 packet per
second (λ = 1.0). As in the case with CBR, we
computed the detection rate and the rate of false positives
with the associated arithmetic averages and 95%
confidence intervals.
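The two traffic models can be contrasted with a small Python sketch that generates packet injection times. It is an illustration only and not the traffic generator of the simulator; the function names are hypothetical, and the Poisson process is realized through exponentially distributed inter-arrival times.

```python
import random


def cbr_send_times(rate: float, duration: float) -> list:
    """Constant bit rate: one packet every 1/rate seconds."""
    interval = 1.0 / rate
    return [i * interval for i in range(int(duration * rate))]


def poisson_send_times(rate: float, duration: float,
                       rng: random.Random) -> list:
    """Poisson arrivals with mean `rate` packets per second:
    inter-arrival times are exponential with parameter `rate`."""
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times
```

Over the 4-hour (14,400 s) simulation time and a rate of 1 packet per second, `cbr_send_times` yields exactly 14,400 injection times, while `poisson_send_times` yields about the same number on average but with irregular spacing.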

The results based on these two traffic models were
similar; in fact, we could not find the difference
between them to be statistically significant. This
indicates that the detection process is robust against
some variation in data traffic. This conclusion also
reflects positively on the usefulness of the genes used.
More importantly, it helped dispel our worries that the
results presented in this experimental study could be
unacceptably data traffic dependent.

8 Related Work 

In f2^, '24^ the authors introduced an AIS based misbe- 
havior detection system for ad hoc wireless networks. 
They used Glomosim for simulating data traffic; their
setup was an area of 800 x 600m with 40 mobile nodes
(speed 1 m/s) of which 5-20 were misbehaving; the
routing protocol was DSR. Four genes were used to capture
implemented is a subset of misbehavior introduced in 
this paper; their observed detection rate is about 55%. 
Additionally, a co-stimulation in the form of a danger 
signal was used in order to inform nodes on a forward- 
ing path about misbehavior, thus propagating informa- 
tion about misbehaving nodes around the network. 

In [17] the authors describe an AIS able to detect
anomalies at the transport layer of the OSI protocol
stack; only a wired TCP/IP network is considered. Self
is defined as normal pairwise connections. Each
detector is represented as a 49-bit string. The pattern
matching is based on r-contiguous bits with a fixed r = 12.

Ref. [23] discusses a network intrusion system that
aims at detecting misbehavior by capturing TCP packet
headers. They report that their AIS is unsuitable for
detecting anomalies in communication networks. This
result is questioned in [4], where it is stated that this
is due to the choice of problem representation and due to
the choice of matching threshold r for r-contiguous bits
matching.

To overcome the deficiencies of the generate-and-test
approach, a different approach is outlined in [22].
Several signals, each having a different function, are
employed in order to detect a specific misbehavior in
wireless sensor networks. Unfortunately, no performance
analysis was presented and the properties of these
signals were not evaluated with respect to their misuse.

The main discerning factor between our work and
the works briefly discussed above is that we carefully
considered hardware parameters of current sensor devices,
the set of input parameters was designed in order to
specifically target sensor networks, and our simulation
setup reflects structural qualities of such networks with
regard to the existence of multiple independent routing
paths. In comparison to [26, 23] we showed that in the
case of static sensor networks it is reasonable to expect
the detection rate to be above 80%.

9 Conclusions and Future Work 

Although we answered some basic questions on the
suitability and feasibility of AIS for detecting
misbehavior in sensor networks, a few questions remain open.

The key question in the design of AIS is the quantity,
quality and ordering of genes that are used for
measuring behavior at nodes. To answer this question a
detailed formal analysis of communications protocols will
be needed. The set of genes should be as "complete" as
possible with respect to any possible misbehavior. The
choice of genes should impose a high degree of sen- 
sor network's survivability defined as the capability of 
a system to fulfill its mission in a timely manner, even in 
the presence of attacks, failures or accidents till . It is 
therefore of paramount importance that the sensor net- 
work's mission is clearly defined and achievable under 
normal operating conditions. 

We showed the influence and usefulness of certain
genes in detecting misbehavior and the impact of
the r parameter on the detection process. In general, the
results in Fig. 10 show that Gene #1 and #2 obtained the
best results of all genes, with Gene #2 always showing
the best results. The contribution of Gene #1 suggests
that observing the MAC layer and the ratio of complete
handshakes to the number of RTS packets sent is useful
for the implemented misbehavior.

Gene #2 fits perfectly for the implemented misbehav- 
ior. It therefore comes as no surprise that this gene 
showed the best results in the detection process. The 
question which remains open is whether the two genes 
are still as useful when exposed to different attack pat- 
terns. 

It is currently unclear whether genes that performed
well with negative selection will also be appropriate
for generating different flavors of signals as suggested
within the danger theory [1, 16]. It is our opinion that
any set of genes, whether used with negative selection
or for generating any such signal, should aim at
capturing intrinsic properties of the interaction among
different components of a given sensor network. This
contradicts approaches applied in [26, 22] where the genes
are closely coupled with a given protocol. The reason
for this statement is the combined performance of
Gene #1 and #2. Their interaction can be understood
as follows: data packet dropping implies less medium
contention since there are fewer data packets to be
forwarded. Fewer data packets to forward, on the other
hand, implies easier access to the medium, i.e. the number
of complete MAC handshakes should increase. This is
an interesting complementary relationship since, in
order to deceive these two genes, a misbehaving node has
to appear to be correctly forwarding data packets and,
at the same time, it should not significantly modify the
"game" of medium access.

It is improbable that the misbehaving node alone
would be able to estimate the impact of dropped packets
on the contention level. Therefore, it lacks an
important feedback mechanism that would allow it to keep
the contention level unchanged. For that, it would need
to act in collusion with other nodes. The property of
complementarity moves the burden of excessive
communication from normally behaving nodes to misbehaving
nodes, thus exploiting the ad hoc (local) nature of
sensor networks. Our results thus imply that a "good"
mixture of genes should be able to capture interactions
that a node is unable to influence when acting alone. It
is an open question whether there exist other useful
properties of genes, other than complementarity.

We conclude that the random-generate-and-test
process, with no knowledge of the used protocols and their
behavior, creates many detectors which might prove to
be superfluous in detecting misbehavior. A process with
some basic knowledge of protocol limitations might
lead to improved quality of detectors.

In [28] the authors stated that the random-generate-
and-test process "is inefficient, since a vast number
of randomly generated detectors need to be discarded,
before the required number of the suitable ones are
obtained". Our results show that at r = 10, the rate of
discarded detectors is less than 4%. Hence, at least in
our setting we could not confirm the above statement.
A disturbing fact is, however, that the size of the self set
in our setting was probably too small to justify
the use of negative selection. A counter-balancing
argument here is the realistic setup of our simulations
and a decent detection rate.

We would like to point out that the Fisher iris and
biomedical data sets, used in [28] to argue about the
appropriateness of negative selection for anomaly
detection, could be very different from data sets
generated by our simulations. Our experiments show that
anomaly (misbehavior) data sets based on sensor networks
could in general be very sparse. This effect can be due
to the limiting nature of communications protocols. Since
the Fisher iris and biomedical data sets were in [28] not
evaluated with respect to some basic properties, e.g.
degree of clustering, it is hard to compare our results
with the results presented therein.

In order to understand the effects of misbehavior
better (e.g. the propagation of certain adverse effects), we
are currently developing a general framework for AIS to
be used within the JiST/SWANS network simulator.